Spatial-Temporal Data Mining in Wireless
Sensor Networks
V. K. Patle
School of Studies in Computer Science and I.T., Pt. Ravishankar Shukla University,
Raipur, C.G.
*Corresponding Author E-mail: patlevinod@gmail.com
ABSTRACT:
Data mining techniques are used to discover meaningful knowledge from a data
set. These techniques have been applied to intrusion detection, customized
marketing, web personalization, and many other real-life problems. Knowledge
discovery from sensor data is an emerging research area because of the variety
of applications it offers society. Wireless Sensor Networks (WSNs) produce
large volumes of data in the form of streams. Spatiotemporal mining of sensor
data provides useful information for different applications.
KEYWORDS: WSN, Spatial-Temporal Data Mining.
1. INTRODUCTION:
Sensor networks are found in an increasing number of applications in many
areas, including battlefields, smart buildings, and even the human body. Most
sensor networks consist of a collection of lightweight (possibly mobile)
sensors connected via wireless links to each other or to a more powerful
gateway node, which is in turn connected to an external network through either
wired or wireless connections. Sensor nodes usually communicate in a
peer-to-peer architecture over an asynchronous network. In many applications,
sensors are deployed in hostile and difficult-to-access locations, with
constraints on weight, power supply, and cost. Moreover, sensors must process
a continuous (possibly fast) stream of data. Data mining in wireless sensor
networks (WSNs) is a challenging area, as algorithms need to work in the
extremely demanding and constrained environment of sensor networks (bounded
energy, storage, bandwidth, and computational power). WSNs also require highly
decentralized algorithms [1, 2].
The development of algorithms that take into consideration the characteristics
of sensor networks, such as energy and computation constraints, network
dynamics, and faults, constitutes an area of current research. Some work has
been done on developing localized, collaborative, distributed, and
self-configuring mechanisms in sensor networks.
In designing algorithms for sensor networks, it is imperative to keep in mind
that power consumption has to be minimized. Even gathering the distributed
sensor data at a single site can be expensive in terms of battery power
consumed, so some attempts have been made at balancing the energy-quality
trade-off and making the data collection task energy efficient. An important
optimization problem is clustering the nodes of the sensor network. Nodes that
are clustered together can communicate with each other easily, which can be
exploited for energy optimization and motivates the development of optimal
algorithms for clustering sensor nodes. Other work in this field includes
finding frequent item sets, identifying rare events or anomalies, and data
preprocessing in sensor networks [2].
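As an illustration of the node-clustering problem mentioned above, the
following minimal sketch groups nodes by position with a plain k-means loop and
picks, as cluster head, the node nearest to each centroid. The node
coordinates, the number of clusters, the simplistic centroid initialization,
and the function names are all assumptions made for illustration; no particular
clustering algorithm is prescribed here.

```python
import math

def kmeans_nodes(positions, k, iterations=20):
    """Group 2-D node positions into k clusters with plain k-means."""
    # Simplistic deterministic start (an assumption): evenly spaced nodes.
    centroids = [positions[i * len(positions) // k] for i in range(k)]
    assignment = [0] * len(positions)
    for _ in range(iterations):
        # Assign each node to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in positions
        ]
        # Move each centroid to the mean position of its members.
        for c in range(k):
            members = [p for p, a in zip(positions, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids

def cluster_heads(positions, assignment, centroids):
    """Per cluster, return the index of the node closest to the centroid."""
    heads = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            heads[c] = min(
                members, key=lambda i: math.dist(positions[i], centroid)
            )
    return heads

# Hypothetical deployment: two well-separated groups of three nodes.
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
assignment, centroids = kmeans_nodes(nodes, k=2)
heads = cluster_heads(nodes, assignment, centroids)
```

In such a scheme, ordinary nodes would send readings only to their nearby
cluster head, which is one way short-range communication could save energy.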
After a short introduction to data mining, this paper presents spatial and
temporal data mining techniques with reference to wireless sensor networks,
including recent related work in this area of research.
2. DATA MINING DEFINITION:
Data mining is a powerful technology with great potential to help academia and
industry focus on the most important information in their data warehouses. The
two primary goals of data mining tend to be prediction and description.
Prediction involves using some variables or fields in the data set to predict
unknown or future values of other variables of interest. Description, on the
other hand, emphasizes finding patterns that describe the data and can be
interpreted by humans. Fig. 1 presents a short architecture of the data mining
process. First, a domain-specific wrapper filters the useful data out of the
data set to be mined; then only these useful data are mined by the
spatial-temporal mining engine; finally, the resulting data are made available
to end users [3, 4].
Fig. 1 Data Mining Architecture
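The flow described for the data mining architecture (data set, domain-specific
wrapper, mining engine, end user) can be sketched as a chain of two functions.
This is only a toy illustration: the record layout, the plausibility range used
by the wrapper, and the per-node mean used as the "mining" step are assumptions,
not part of the architecture itself.

```python
from statistics import mean

def domain_wrapper(records, valid_range=(-40.0, 85.0)):
    """Keep only records whose reading is present and physically plausible."""
    low, high = valid_range
    return [
        r for r in records
        if r.get("value") is not None and low <= r["value"] <= high
    ]

def mining_engine(records):
    """Toy mining step: mean reading per sensor node."""
    by_node = {}
    for r in records:
        by_node.setdefault(r["node"], []).append(r["value"])
    return {node: mean(values) for node, values in by_node.items()}

# Hypothetical raw data set with one missing and one implausible reading.
raw = [
    {"node": "A", "value": 21.0},
    {"node": "A", "value": 23.0},
    {"node": "B", "value": None},   # missing reading, filtered out
    {"node": "B", "value": 999.0},  # out of range, filtered out
    {"node": "B", "value": 19.0},
]
summary = mining_engine(domain_wrapper(raw))  # result handed to the end user
```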
3. TEMPORAL DATA MINING:
Temporal data mining [4, 5] is concerned with the analysis of events ordered
by one or more dimensions of time. We differentiate between two broad
directions. One concerns the discovery of causal relationships among
temporally oriented events. The other concerns the discovery of similar
patterns within the same time sequence or among different time sequences. This
latter area, commonly known as time series analysis (or trend analysis),
focuses on the identification of similar pre-specified patterns.
3.1 Mining of Temporal Sequences:
The goal of temporal data mining is to discover hidden relations between
sequences and subsequences of events. Discovering relations between sequences
of events involves three steps: representing and modeling the data sequence in
a suitable form; defining similarity measures between sequences; and applying
the models and representations to the actual mining problems. A sequence
composed of a series of nominal symbols from a particular alphabet is usually
called a temporal sequence, and a sequence of continuous, real-valued elements
is known as a time series.
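To make the distinction between a time series and a temporal sequence concrete,
the sketch below maps real-valued readings to symbols from a small alphabet and
also defines one common similarity measure, the Euclidean distance between two
equal-length series. The breakpoints, the alphabet "LMH", and the function
names are illustrative assumptions; other discretizations and similarity
measures are equally possible.

```python
import math

def euclidean_distance(a, b):
    """Similarity measure between two equal-length time series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def to_symbols(series, breakpoints, alphabet):
    """Turn a time series into a temporal sequence of nominal symbols.

    A value is mapped to alphabet[i], where i is the number of
    breakpoints it reaches or exceeds.
    """
    if len(alphabet) != len(breakpoints) + 1:
        raise ValueError("need exactly one more symbol than breakpoints")
    return "".join(
        alphabet[sum(1 for b in breakpoints if value >= b)]
        for value in series
    )

# Hypothetical temperature readings (a time series).
temperatures = [18.0, 21.5, 26.0, 30.2]
# Low / Medium / High symbols (a temporal sequence).
symbols = to_symbols(temperatures, breakpoints=[20.0, 28.0], alphabet="LMH")
distance = euclidean_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```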
3.2 Temporal Sequences Representation:
3.2.1. Time-Domain Continuous Representations:
A simple approach to representing a sequence of real-valued elements (a time
series) is to use the initial elements, ordered by their instant of
occurrence, without any preprocessing. An alternative is to find a piecewise
linear function that approximately describes the entire initial sequence. The
objective is to obtain a representation amenable to the detection of
significant changes in the sequence.
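One way to obtain such a piecewise linear representation is a greedy
sliding-window segmentation, sketched below. Each segment is the straight line
joining its two end points, and a segment is extended as long as no interior
point deviates from that line by more than a tolerance. The function name and
the `max_error` parameter are assumptions for illustration; this is one of
several possible segmentation strategies, not a method taken from the cited
works.

```python
def piecewise_linear(series, max_error):
    """Greedy sliding-window segmentation of a time series.

    Returns (start_index, end_index) pairs; consecutive segments share
    their boundary point.
    """
    def chord_error(start, end):
        # Largest vertical deviation of interior points from the chord.
        if end - start < 2:
            return 0.0
        slope = (series[end] - series[start]) / (end - start)
        return max(
            abs(series[start] + slope * (i - start) - series[i])
            for i in range(start + 1, end)
        )

    segments = []
    start = 0
    n = len(series)
    while start < n - 1:
        end = start + 1
        # Grow the segment while the approximation stays within tolerance.
        while end + 1 < n and chord_error(start, end + 1) <= max_error:
            end += 1
        segments.append((start, end))
        start = end
    return segments

# A rise followed by a fall yields two segments meeting at the peak.
segments = piecewise_linear([0, 1, 2, 3, 2, 1, 0], max_error=0.1)
```

The segment boundaries mark the points where the trend changes, which is what
makes this representation convenient for detecting significant changes.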
3.2.2. Transformation Based Representations:
The basic idea of transformation-based representations is to transform the
initial sequences from the time domain to another domain, and then to use a
point in this new domain to represent each original sequence. One proposal
uses the Discrete Fourier Transform (DFT) to transform a sequence from the
time domain to a point in the frequency domain.
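A minimal sketch of the DFT-based idea follows: the transform is computed
directly from its definition, and the magnitudes of the first few coefficients
serve as the point that represents the sequence. Keeping only the first
`num_coeffs` coefficients, and using their magnitudes, are assumptions made for
illustration; the direct O(n^2) computation is chosen for clarity rather than
speed.

```python
import cmath

def dft(series):
    """Discrete Fourier Transform computed directly from its definition."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(series))
        for k in range(n)
    ]

def dft_features(series, num_coeffs):
    """Represent a sequence by the magnitudes of its first coefficients."""
    return [abs(c) for c in dft(series)[:num_coeffs]]

# One full cosine cycle sampled at four points: the energy sits in the
# first harmonic, so the second feature dominates the first.
features = dft_features([1.0, 0.0, -1.0, 0.0], num_coeffs=2)
```

Two sequences can then be compared through the distance between their feature
points instead of through all of their original samples.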
A more recent approach uses the Discrete Wavelet Transform (DWT) to translate
each sequence from the time domain into the time/frequency domain. The DWT is
a linear transformation that decomposes the original sequence into different
frequency components without losing the information about the instant at which
the elements occur.
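The time-localization property of the DWT can be seen in a small example. The
sketch below uses the Haar wavelet, the simplest wavelet family, which is an
assumption made here for illustration since no specific wavelet is named in the
text. The input length is assumed to be a power of two.

```python
import math

def haar_step(signal):
    """One decomposition level: pairwise averages and differences."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal):
    """Full Haar decomposition of a power-of-two-length signal.

    Returns the final approximation and the detail coefficients,
    ordered from the coarsest level to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)
    return approx, details

# A single spike at position 6: only the finest detail coefficient that
# covers positions 6 and 7 is non-zero, so the instant is preserved.
approx, details = haar_dwt([0, 0, 0, 0, 0, 0, 9, 0])
finest = details[-1]
```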
3.3 Temporal Data Mining Tasks:
Data mining has been used in a wide range of applications. Temporal data
mining tasks may be grouped as follows:
(i) Prediction,
(ii) Classification,
(iii) Clustering,
(iv) Search and retrieval, and
(v) Pattern discovery.
The task of time-series prediction is to forecast (typically) future values of
the time series based on its past samples. To do this, one needs to build a
predictive model for the data.
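As a small example of building a predictive model from past samples, the sketch
below fits a first-order autoregressive model, x[t] = a * x[t-1] + b, by
ordinary least squares and iterates it to forecast future values. The choice of
this model and the function names are assumptions for illustration; practical
forecasting would normally use richer models and validation.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b to past samples."""
    if len(series) < 2:
        raise ValueError("need at least two samples")
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant history: fall back to predicting the mean.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    b = mean_y - a * mean_x
    return a, b

def forecast(series, steps):
    """Forecast the next `steps` values by iterating the fitted model."""
    a, b = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = a * last + b
        predictions.append(last)
    return predictions

# A steadily rising series is continued along the same trend.
predictions = forecast([1.0, 2.0, 3.0, 4.0, 5.0], steps=2)
```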
4. SPATIAL DATA MINING:
The main difference between data mining in relational database systems (DBS)
and in spatial DBS is that attributes of the neighbors of an object of
interest may influence the object and therefore have to be considered as well.
The explicit location and extension of spatial objects define implicit
relations of spatial neighborhood, which are used by spatial data mining
algorithms [5].
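The role of neighbors can be sketched with a simple distance-based neighborhood
relation: two objects are neighbors if they lie within a given radius, and an
attribute of an object is then examined together with the same attribute of its
neighbors. The sensor names, positions, radius, and the use of a mean as the
aggregate are assumptions for illustration; topological or directional
neighborhood relations could be used instead.

```python
import math

def neighbors(objects, name, radius):
    """Names of all other objects within `radius` of the given object."""
    center = objects[name]["pos"]
    return [
        other for other, obj in objects.items()
        if other != name and math.dist(center, obj["pos"]) <= radius
    ]

def neighborhood_mean(objects, name, radius, attribute):
    """Mean of an attribute over the neighbors of an object, or None."""
    near = neighbors(objects, name, radius)
    if not near:
        return None
    return sum(objects[n][attribute] for n in near) / len(near)

# Hypothetical sensors with a position and a temperature attribute.
sensors = {
    "s1": {"pos": (0, 0), "temp": 20.0},
    "s2": {"pos": (1, 0), "temp": 22.0},
    "s3": {"pos": (0, 1), "temp": 24.0},
    "s4": {"pos": (5, 5), "temp": 40.0},
}
near_s1 = neighbors(sensors, "s1", radius=1.5)
mean_near_s1 = neighborhood_mean(sensors, "s1", radius=1.5, attribute="temp")
```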
4.1 Database Primitives for Spatial Data Mining:
A set of database primitives for mining in spatial databases can be defined
that is sufficient to express most spatial data mining algorithms and that can
be efficiently supported by a DBMS.
4.2 Efficient DBMS Support:
Effective filters restrict the search to those neighborhood paths “leading
away” from a starting object. Neighborhood indices materialize certain
neighborhood graphs to support efficient processing of the database primitives
by a DBMS.
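A rough sketch of both ideas follows. The neighborhood index is materialized
as a dictionary that stores, for each object, its neighbors within a radius,
and the filter admits a path extension only if the next object lies farther
from the starting object than the current one. Interpreting "leading away" as
strictly increasing Euclidean distance from the start, as well as all names
and coordinates, are assumptions made for illustration.

```python
import math

def build_neighborhood_index(positions, radius):
    """Materialize the neighborhood graph: object -> list of neighbors."""
    index = {name: [] for name in positions}
    names = list(positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radius:
                index[a].append(b)
                index[b].append(a)
    return index

def paths_leading_away(positions, index, start, max_length):
    """Neighborhood paths (up to max_length edges) in which every step
    increases the distance from the starting object."""
    origin = positions[start]
    results = []

    def extend(path):
        if len(path) > 1:
            results.append(list(path))
        if len(path) - 1 == max_length:
            return
        current = math.dist(origin, positions[path[-1]])
        for nxt in index[path[-1]]:
            # Filter: only follow neighbors farther from the start.
            if math.dist(origin, positions[nxt]) > current:
                extend(path + [nxt])

    extend([start])
    return results

# Hypothetical objects on a small grid.
positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 1)}
index = build_neighborhood_index(positions, radius=1.1)
paths = paths_leading_away(positions, index, start="a", max_length=2)
```

Because the distance from the start must grow at every step, a path can never
revisit an object, which keeps the number of candidate paths small.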
5. SPATIAL-TEMPORAL DATA MINING:
Spatiotemporal data mining refers to the extraction of implicit knowledge,
spatial-temporal relationships, or other patterns not explicitly stored in
spatial-temporal databases.
5.1 Spatialization and Temporalization of Data Mining Techniques:
Spatial-temporal data mining lies at the confluence of several fields,
including spatio-temporal databases, statistics, machine learning, information
theory, and geographic visualization. First, spatial and temporal
relationships exist among spatial entities at various levels (scales). Both
metric (such as distance) and non-metric (such as topology, direction, and
shape) spatial relations, as well as temporal relations (such as before or
after), may be explicit or implicit in geographic databases. Second, spatial
and temporal dependency and heterogeneity are intrinsic characteristics of
spatiotemporal databases. Third, the scale effect in space and time is a
challenging research issue in geographic analysis [6].
5.2 The Spatial-Temporal Data-Mining Process:
The data-mining process usually consists of three steps:
(1) pre-processing;
(2) modeling and validation; and
(3) post-processing.
In the first step, the data may need some cleaning and transformation to
satisfy constraints imposed by the tools, algorithms, or users. The second
step consists of choosing or building a model that best reflects the
application behavior. Finally, the third step consists of using the model,
evaluated and validated in the second step, to study the application behavior
effectively.
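The three steps can be sketched end to end on a toy stream of readings. In this
sketch, pre-processing drops missing values, the model is simply the mean of a
training portion validated by its mean absolute error on a held-out tail, and
post-processing uses the validated model to flag unusual new readings. The
constant-mean model, the hold-out fraction, and the threshold factor are all
assumptions chosen to keep the example short.

```python
from statistics import mean

def preprocess(readings):
    """Step 1: clean the data (here, drop missing values)."""
    return [r for r in readings if r is not None]

def fit_and_validate(readings, holdout=0.25):
    """Step 2: fit a constant (mean) model on the first part of the data
    and validate it by mean absolute error on the held-out tail."""
    split = int(len(readings) * (1 - holdout))
    train, test = readings[:split], readings[split:]
    model = mean(train)
    error = mean([abs(x - model) for x in test])
    return model, error

def postprocess(model, error, new_reading, factor=3.0):
    """Step 3: use the validated model to flag readings that deviate
    far more than the validation error suggests."""
    return abs(new_reading - model) > factor * max(error, 1e-9)

# Hypothetical temperature stream with one missing value.
raw = [20.0, None, 21.0, 19.0, 20.0, 21.0, 19.0, 21.0, 19.0]
clean = preprocess(raw)
model, error = fit_and_validate(clean)
is_unusual = postprocess(model, error, new_reading=35.0)
is_typical = not postprocess(model, error, new_reading=21.5)
```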
5.3 Spatial-Temporal Data Representation and Infrastructure:
In a review of temporal knowledge discovery, four broad categories of
temporality within data are identified:
• Static
• Sequences
• Time stamped
• Fully temporal
6. WSN CHALLENGES:
There are various challenges
faced by WSNs [2, 7]. Some of them are stated below:
6.1 Real-Time:
Sometimes data must be delivered within a given time or deadline. Not all the
protocols developed for WSNs meet real-time requirements, so developing
real-time protocols is a challenge for WSNs.
6.2 Power/Energy Management:
A large amount of energy is consumed during communication among the nodes. A
sensor monitoring a critical area should not run out of battery power, so
multiple sensors should be deployed in such areas instead of a single sensor.
6.3 Coverage Problems:
One of the fundamental issues that arise in sensor networks, in addition to
location calculation, tracking, and deployment, is coverage. Due to the large
variety of sensors and applications, coverage is subject to a wide range of
interpretations [8].
6.4 Security:
To achieve security in a sensor network, security must be integrated into
every single component of the system. One of the main challenges is how to
secure a wireless network against eavesdropping [9].
6.5 Anomaly:
Sensor nodes gather data, and there is a high possibility that the data become
corrupted. The main focus of this survey is on this challenge.
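As a simple illustration of detecting corrupted readings, the sketch below
flags values that lie far from the median of a window, measured in units of the
median absolute deviation (MAD). This robust rule, the multiplier `k`, and the
sample readings are assumptions for illustration; it is not one of the specific
anomaly detection techniques surveyed in the cited literature.

```python
from statistics import median

def mad_anomalies(readings, k=5.0):
    """Indices of readings farther than k * MAD from the median."""
    center = median(readings)
    mad = median([abs(x - center) for x in readings])
    if mad == 0:
        # Degenerate window: anything that differs from the median stands out.
        return [i for i, x in enumerate(readings) if x != center]
    return [i for i, x in enumerate(readings) if abs(x - center) > k * mad]

# Hypothetical temperature window with one corrupted value.
window = [20.1, 19.9, 20.0, 20.2, 55.0, 19.8, 20.0]
suspect_indices = mad_anomalies(window)
```

The median and the MAD are used instead of the mean and standard deviation
because a single corrupted value barely shifts them.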
7. RELATED WORK:
Data mining is an essential step in the knowledge discovery process; it is
concerned with extracting hidden knowledge from vast amounts of data using
techniques inspired by different disciplines, such as databases, machine
learning, artificial intelligence, and statistics [10]. Recently, data mining
techniques have been used to extract patterns from data collected by a WSN.
These patterns are usually used to gain insight into the phenomena being
monitored. They can also be used to improve the performance of the network.
In [11], the authors give an overview of wireless sensor networks and their
challenges. They survey one of these challenges, anomaly detection, a recent
area of research in mining sensor data. The various types of anomalies that
can be present in a wireless sensor network are briefly explained to provide
an overview. They also introduce the architecture used for anomaly detection
and give a brief introduction to the techniques.
Another paper [12] proposes possible adaptive methodologies, such as the ART
model and the PCA technique [13], for mining data in large sensor networks.
The authors argue that the structure of the processing architecture of a
sensor network must be taken into account for the data mining task. Data
clustering algorithms for data spread over a sensor network are necessary in
many sensor-network applications. The limited resources, together with the
distributed nature of sensor networks, demand a fundamentally distributed
algorithmic solution for data clustering. The organization of the sensor
network may change according to the sensing task, such as classification or
prediction, so the accuracy and quality of the data mining task must be taken
into account.
In [14], the authors evaluate geographical spatial-temporal correlations with
the methods of geostatistical interpolation, wavelet data decomposition, fuzzy
c-means clustering, and Apriori-based logical rule extraction, respectively.
They consider a geographical transaction only as a composition of the features
space, time, air temperature, precipitation, and vegetation, that is, as a
five-dimensional geographical object.
In [15], the authors propose a framework for spatiotemporal knowledge
discovery that supports the development of new kinds of knowledge, such as the
spatiotemporal moving pattern. They argue that the proposed framework can
represent the definitions and relationships of spatiotemporal data sets and
knowledge by using a foundation model for knowledge discovery. The authors
evaluate the characteristics of the proposed framework and present some of the
related problems.
In [16], the authors present a comparative study of three classification
techniques, J48 (decision tree), Naive Bayes, and ZeroR, with labeled data in
a wireless sensor network. The data were obtained from the Labelled Wireless
Sensor Network Data Repository (LWSNDR) and consisted of humidity and
temperature measurements collected over a 6-hour period at 5-second intervals.
Label ‘0’ denotes normal data and label ‘1’ denotes an introduced event. The
performance of the three techniques is reported in terms of accuracy summary,
classifier error, and confusion matrix. The experiments found that the Naive
Bayes algorithm is more suitable for the data set used, both for reducing data
transmission in the WSN effectively and for implementing classification
simply. The study concluded that, by classifying the large data set at the
sensor-node level, normal values can be discarded and only the anomalous
values transmitted to the central server.
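The node-level filtering idea can be sketched as follows: a classifier trained
on labeled (temperature, humidity) pairs runs on the node, and only readings
classified as events (label 1) are transmitted. The sketch includes a ZeroR
baseline and a small Gaussian Naive Bayes written from scratch. The training
values are invented for illustration, and the code is not the implementation or
the data used in [16].

```python
import math
from collections import Counter, defaultdict

def train_zero_r(labels):
    """ZeroR baseline: always predict the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda features: majority

def train_gaussian_nb(samples, labels):
    """Gaussian Naive Bayes over numeric feature tuples."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    stats = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((v - m) ** 2 for v in c) / len(c), 1e-6)
            for c, m in zip(columns, means)
        ]
        stats[label] = (math.log(len(rows) / len(samples)), means, variances)

    def predict(features):
        def log_posterior(label):
            prior, means, variances = stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(features, means, variances)
            )
        return max(stats, key=log_posterior)

    return predict

def readings_to_transmit(readings, classify):
    """Node-side filter: keep only readings classified as events (1)."""
    return [r for r in readings if classify(r) == 1]

# Hypothetical labeled (temperature, humidity) data: 0 normal, 1 event.
train_x = [(25.0, 40.0), (25.5, 41.0), (24.5, 39.0), (25.2, 40.5),
           (35.0, 20.0), (36.0, 22.0)]
train_y = [0, 0, 0, 0, 1, 1]
new_readings = [(25.1, 40.2), (35.5, 21.0), (24.8, 39.5)]

nb_sent = readings_to_transmit(new_readings, train_gaussian_nb(train_x, train_y))
zero_r_sent = readings_to_transmit(new_readings, train_zero_r(train_y))
```

On this toy data the ZeroR baseline never transmits anything, because it always
predicts the majority (normal) class, whereas the Naive Bayes filter forwards
only the reading that resembles the labeled events.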
8. CONCLUSION AND FUTURE WORK:
For spatio-temporal data sets, spatio-temporal data mining is necessary to
extract knowledge and information. By applying different algorithms for
different data mining techniques, we can choose a suitable algorithm for a
given data set to extract knowledge. This knowledge can be used for different
purposes; for example, data transmission can be reduced by sending only the
required data to the central server.
In the future, mining can be applied to WSN data to increase the lifetime of
sensors by finding specific patterns in the temporal sequences and
geographical data.
9. ACKNOWLEDGMENTS:
The author is thankful to the University Grants Commission, New Delhi, India,
for the Minor Research Project. The author is also thankful to Pt. Ravishankar
Shukla University, Raipur, India, for its resources, hosting, and necessary
support.
10. REFERENCES:
[1] Z. Obradovic, D. Das, V. Radosavljevic, K. Ristovski, S. Vucetic,
“Spatio-Temporal Characterization of Aerosols Through Active Use of Data from
Multiple Sensors”, ISPRS TC VII Symposium, Vienna, Austria, July 5–7, 2010,
IAPRS, vol. XXXVIII, part 7B, pp. 424-429.
[2] Jiawei Han and Jing Gao,
“Research Challenges for Data Mining in Science and Engineering”, University of
Illinois at Urbana-Champaign.
[3] R. Geetha, N. Sumathi and S. Sathiyabama, “A survey of spatial, temporal
and spatio temporal data mining”, Journal of Computer Applications, vol. 1,
no. 4, Oct-Dec 2008, pp. 31-33.
[4] S. P. Deshpande and V. M. Thakare, “Data Mining System and Applications: A
Review”, International Journal of Distributed and Parallel Systems (IJDPS),
vol. 1, no. 1, September 2010.
[5] K. Venkateswara Rao, A. Govardhan and K.V. Chalapati Rao, “Spatiotemporal Data Mining: Issues, Tasks and
Applications”, International Journal of Computer Science and Engineering Survey
(IJCSES) Vol.3, No.1, February 2012.
[6] Xiaobai Yao, “Research Issues in Spatio-temporal Data Mining”, University
Consortium for Geographic Information Science (UCGIS) workshop on Geospatial
Visualization and Knowledge Discovery, Lansdowne, Virginia, Nov. 18-20, 2003.
[7] Shivanajay Marwaha, Jadwiga Indulska, Marius Portmann,
“Challenges and Recent Advances in QoS Provisioning
in Wireless Mesh Networks”, School of Information Technology and Electrical
Engineering, University of Queensland and National ICT Australia (NICTA)
Queensland Research Laboratory (QRL), Brisbane, Australia,
978-1-4244-2358-3/2008 IEEE, pp 618-623.
[8] Seapahn Meguerdichian, Farinaz Koushanfar, Miodrag Potkonjak and Mani B.
Srivastava, “Coverage Problems in Wireless Ad-hoc Sensor Networks”, Computer
Science Department, University of California, Los Angeles, Rockwell Science
Center (RSC) and DARPA.
[9] Zoran S. Bojkovic, Bojan M. Bakmaz, and Miodrag R. Bakmaz, “Security
Issues in Wireless Sensor Networks”, International Journal of Communications
Issue 1, Volume 2, 2008.
[10] J. Han and M. Kamber, Data Mining: Concepts and Techniques, second ed.
Morgan Kaufmann Publishers, 2006.
[11] Gourav Sahni and Sonia Sharma, “Study of Various Anomalies and
Anomaly Detection Methodologies in Wireless Sensor Network”, International Journal of Advanced Research in
Computer Science and Software Engineering,
Volume 3, Issue 5, May 2013 ISSN: 2277 128X pp 700-703.
[12] Lambodar Jena, Ramakrushna Swain, Narendra K. Kamila, “Mining Wireless
Sensor Network Data: an adaptive approach based on artificial neural networks
algorithm”, IJCCT Vol.1 Issue 2, 3, 4; 2010 for International Conference
[ACCTA-2010], pp. 347-353.
[13] M. Birattari, G. Bontempi, and H. Bersini, “Lazy learning meets the
recursive least-squares algorithm”, in M. S. Kearns, S. A. Solla, and D. A.
Cohn, editors, NIPS 11, pages 375–381, Cambridge, 1999. MIT Press.
[14] Hong Shua, Xinyan Zhu, Shangping Dai, “Mining Association Rules in Geographical Spatio-Temporal Data”, The
International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences. Vol. XXXVII. Part B2. Beijing 2008.
[15] Jun-Wook Lee and Yong-Joon Lee, “A Knowledge Discovery Framework for
Spatiotemporal Data Mining”, International Journal of Information Processing
Systems, Vol.2, No.2, June 2006, pp. 124-129.
[16] Bhawana Parbat, R. K. Dhuware, “Comparative Study of Classification
Techniques with Labeled Data in Wireless Sensor Network”, International
Journal of Computer Applications (0975 – 8887), Volume 69, No.11, May 2013.
Received on 17.03.2014; Modified on 18.04.2014; Accepted on 03.05.2014
©A&V Publications. All rights reserved.
Research J. Science and Tech. 6(2): April-June 2014; Page 79-86